Speech recognition for a robot under its motor noises by selective application of missing feature theory and MLLR
Abstract
Automatic speech recognition (ASR) is essential for a robot to communicate with people. One of the main problems with ASR for robots is that robots inevitably generate motor noise. The robot’s microphones capture this noise at high power because the noise sources are closer to the microphones than the target speech source, so the signal-to-noise ratio of the input speech becomes quite low (less than 0 dB). However, the noise can be estimated from information on the robot’s own motions and postures, because a given type of motion/gesture produces almost the same noise pattern every time it is performed. This paper proposes a method to improve ASR under motor noise by using information on the robot’s motion/gesture. The method selectively uses three techniques: multi-condition training, maximum-likelihood linear regression (MLLR), and missing feature theory (MFT). The first two techniques cope with motor noise by selecting the noise-type-dependent acoustic model corresponding to the motion/gesture being performed. The last technique identifies unreliable acoustic features in the input sound by matching the input against pre-recorded noise of the current motion/gesture, and masks them during speech recognition to improve ASR performance. Because the choice of ASR technique affects the system’s performance in our method, we evaluated the performance of the three ASR techniques for each noise type of a robot’s motion/gesture to obtain the best technique selection rule. Preliminary results on isolated word recognition showed the effectiveness of our method using the obtained technique selection rule.
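The two ideas in the abstract, choosing a technique per motion/gesture and masking features that match pre-recorded noise, can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the motion labels, the entries of `SELECTION_RULE`, the fallback technique, and the simple dB-margin test used to decide reliability. The abstract does not give the actual selection rule or the matching procedure, so this is not the authors' implementation.

```python
import numpy as np

# Hypothetical selection rule: motion/gesture label -> ASR technique.
# The abstract says the real rule was obtained experimentally per noise type;
# these entries are placeholders, not reported results.
SELECTION_RULE = {
    "head_nod": "mft",
    "arm_wave": "mllr",
    "walk": "multi_condition",
}


def select_technique(motion, rule=SELECTION_RULE, default="multi_condition"):
    """Return the technique assigned to a motion, or a default if unknown."""
    return rule.get(motion, default)


def missing_feature_mask(input_spec, noise_template, margin_db=3.0):
    """Build a binary reliability mask (1 = reliable, 0 = unreliable).

    Both arguments are log-power spectra in dB with shape (frames, bands),
    assumed to be time-aligned. A feature counts as reliable when the input
    exceeds the pre-recorded noise by at least `margin_db`; this threshold
    test is an illustrative assumption.
    """
    input_spec = np.asarray(input_spec, dtype=float)
    noise_template = np.asarray(noise_template, dtype=float)
    if input_spec.shape != noise_template.shape:
        raise ValueError("input and noise template must have the same shape")
    return (input_spec - noise_template >= margin_db).astype(np.uint8)
```

In a usage scenario, the robot's controller would report the current motion label, `select_technique` would pick the recognizer configuration, and the mask would be passed to an MFT-capable decoder only when the selected technique is `"mft"`.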
Similar resources
Leak energy based missing feature mask generation for ICA and GSS and its evaluation with simultaneous speech recognition
This paper addresses automatic speech recognition (ASR) for robots integrated with sound source separation (SSS) by using leak noise based missing feature mask generation. The missing feature theory (MFT) is a promising approach to improve noise-robustness of ASR. An issue in MFT-based ASR is automatic generation of the missing feature mask. To improve robot audition, we applied this theory to ...
A New Method for Robust Missing-Data-Based Speech Recognition Using a Bidirectional Neural Network
The performance of speech recognition systems is greatly reduced when speech is corrupted by noise. One common approach to robust speech recognition is the missing-feature method. In this approach, the components of the signal's time-frequency representation (spectrogram) that have a low signal-to-noise ratio (SNR) are tagged as missing and deleted, then replaced by the remaining components and statistical ...
Simultaneous Speech Recognition Based on Automatic Missing Feature Mask Generation by Integrating Sound Source Separation
Our goal is to realize a humanoid robot that is capable of recognizing simultaneous speech. A humanoid robot in real-world environments usually hears a mixture of sounds, and thus three capabilities are essential for robot audition: sound source localization, separation, and recognition of separated sounds. In particular, an interface between sound source separation and speech reco...
Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation
A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...